Abstract

Associate to each sequence $A$ of integers (intended to represent
packet IDs) a sequence of positive integers of the same length, $\mathcal{M}(A)$.
The $i$'th entry of $\mathcal{M}(A)$ is the size (at time $i$) of the smallest buffer
needed to hold out-of-order packets, where space is accounted for
unreceived packets as well. Call two sequences $A$, $B$ equivalent (written
$A \equiv_{FB} B$) if $\mathcal{M}(A) = \mathcal{M}(B)$.

We prove the following result: any two permutations $A$, $B$ of the
same length with $SUS(A), SUS(B) \leq 3$ (where SUS is the
shuffled-up-sequences reordering measure [3]), and such that
$A \equiv_{FB} B$, are identical.

The result (which is no longer valid if we replace the upper bound
3 by 4) was motivated by Restored, a receiver-oriented model of
network traffic we introduced in [7].

1 Introduction



The TCP protocol [15] is the fundamental protocol for computer
communications. TCP breaks the information into packets, and attempts to
maintain an ordered packet sequence to be passed to the application layer.
It accomplishes this by buffering packets that arrive out of order.

Recent work in the area of network traffic modeling has brought to at- 
tention the significant impact of packet reordering on the dynamics of this 
protocol P, [21 [To]. This has stimulated research (mainly applied, rather 
than mathematical) on measuring and modeling reordering [TD, [12], and on 
quantifying the impact of packet reordering on application performance. 

In this paper we study a combinatorial problem motivated by modeling
packet reordering in large TCP traces: suppose that we map a sequence $A$ of
packet IDs into the sequence of integers $\mathcal{M}(A)$ representing the different sizes
of the buffer space necessary to store the out-of-order packets; we assume that
space in the buffer is reserved (and accounted) for unreceived out-of-order
packets as well. What kind of additional information on the sequence $A$ is
needed to uniquely identify $A$ given $\mathcal{M}(A)$?

The problem arose in the context of Restored [7] , a method for receiver- 
oriented modeling and compression of large TCP traces. Previously we 
showed experimentally [7] that Restored is able to regenerate sequences 
similar to the original sequences with respect to several reordering metrics. 
One of these metrics was the reorder density (RD) from [HJ [13]. For RD 
the experimental result is somewhat counterintuitive since 

1. Restored generates sequences that are (locally) similar (with respect
to mapping $\mathcal{M}$) to the original sequence.

2. RD can take different values on sequences that map to the same
sequence via $\mathcal{M}$.

Because of this latter property, the fact that the reconstructed sequences 
have similar properties with respect to the original sequence does not follow 
from the theoretical guarantee 1). 

The result in this paper, together with the experimental observation
that over 99% of the traces we previously considered for benchmarking
Restored obey the constraint present in our result, explains why the
theoretical inconsistency of RD is not observed in the "real-world" data we
employed to benchmark Restored.

2 Preliminaries



We first give a brief primer on the relevant aspects of the TCP protocol,
Restored and the concepts used in the sequel.

The TCP protocol [15] attempts to maintain an ordered stream of data
bytes, identified by an integer called byte ID, that is effectively
communicated through the network by breaking it down into packets. The
ordering is maintained by buffering out-of-order packets. The dynamics of the
buffer can be described in part using several parameters.

1. The first parameter is NextByteExpected, and is the smallest index of 
a data byte that has still not been received by the receiver. 

2. A second, related, parameter is LastByteRead, the index of the last byte
processed by the receiver-side application that communicates through
the network via the TCP protocol. Throughout this paper we will make
the simplifying assumption that data is read by the application as soon
as it is ready. In other words NextByteExpected = LastByteRead + 1.

3. Another parameter is LastByteRcvd, the index of the last byte that has 
arrived at the receiver, awaiting processing. 

4. RcvWindow, the size of the receiver window, is a receiver-maintained 
parameter that is meant to provide the sender an estimate of the avail- 
able buffer space at the receiver. 

5. Finally, RcvBuffer is an implementation-dependent system constant, the
size of the receiving buffer.

The functioning of the TCP protocol ensures that these four parameters
are related through the relation ([9], section 3.5):

RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]. (1)

The term in brackets on the right-hand side is the actual size of the
TCP receiver buffer. The measurement takes into account space reserved
(but not necessarily used) for all packets from the first expected to the last
arrived. This is, of course, proportional to the buffer size measured in packets
rather than bytes, provided that all packets have the same size.
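
As a small numeric illustration of relation (1), the sketch below computes the advertised window from the other three quantities. The function name `rcv_window` and the byte counts are hypothetical, chosen only for this example.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Relation (1): space still available in the receive buffer, in bytes."""
    # Bytes held, or reserved, between the last byte read and the last byte received.
    occupied = last_byte_rcvd - last_byte_read
    return rcv_buffer - occupied


# Hypothetical numbers: a 65536-byte buffer, bytes up to 5000 read, up to 9000 received.
print(rcv_window(rcv_buffer=65536, last_byte_rcvd=9000, last_byte_read=5000))  # 61536
```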

TCP is receiver-driven: that is, the receiver attempts to maintain control
of the sender's flow by directing the sender speed and acknowledging
the received packets. An acknowledgment (ACK for short) generally consists
of the ID of the first packet that has not yet been received. Acknowledgment
mechanisms vary from implementation to implementation, and can entail
delayed or selective acknowledgments, urgent retransmission requests, etc.
From our standpoint, what is important is that we can associate a sequence of
integer ACKs to every sequence of packet IDs: the sequence of ACKs that
would be sent if the receiver immediately acknowledged every packet
received.

Example 1 Consider the following hypothetical sequence of packet IDs:
$A = (4\ 3\ 2\ 1)$. Then the sequence of ACKs is $ACK(A) = (1\ 1\ 1\ 5)$.
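
The ACK sequence of Example 1 can be reproduced with a few lines of code. This is a minimal sketch, assuming that packet IDs start at 1 and that every arrival is acknowledged immediately; the name `ack_sequence` is hypothetical.

```python
def ack_sequence(packets):
    """ACK sent after each arrival: the smallest packet ID not yet received."""
    received = set()
    next_expected = 1  # assumption: IDs are numbered from 1
    acks = []
    for pid in packets:
        received.add(pid)
        # Advance past every ID that has already arrived.
        while next_expected in received:
            next_expected += 1
        acks.append(next_expected)
    return acks


print(ack_sequence([4, 3, 2, 1]))  # [1, 1, 1, 5]
```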

Restored [7] is a Markovian model of large TCP traces that incorporates
information on the dynamics of packet reordering. It can be used to
provide estimates of various measures of quality of service without making
these measurements online, or storing the entire sequence. Rather, it first
"compresses" the trace into a small "sketch" that allows regeneration of a TCP
trace with (hopefully) similar characteristics. If needed, we can then perform
a large number of measurements on the regenerated trace.

For the purposes of the present paper, a connection is simply a sequence 
of integers (packet IDs). Suppose that the receiver observes the following 
(hypothetical) packet stream 

1 2 3 6 5 7 4 8 9 10 12 13 14 11. 

In this example packets with IDs 4, 5, 6, 7, 12, 13, 14 and 11 arrive out of 
order. One can, consequently, classify the received packets into two cate- 
gories: those that can be immediately passed to the application layer, and 
those that have to be temporarily stored before delivery. In the example, 
packets 5, 6, and 7 are temporarily buffered, and the buffer is only flushed 
when packet 4 is received. Similarly, packets 12, 13, and 14 are temporarily 
buffered, and the buffer is flushed when packet 11 arrives. We will call a 
packet that marks the end of a sequence of consecutively buffered packets a 
pivot packet. Packets that are immediately delivered to the application layer 
are also trivially pivots. In our example this is the case for packets 1, 2, 3, 
4, 8, 9, 10 and 11. 
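
The classification into pivot and buffered packets can be sketched as follows. The function name `pivots` is hypothetical, and the code assumes IDs numbered from 1 with no losses or duplicates: a packet is a pivot exactly when it is the next expected ID at the moment it arrives.

```python
def pivots(packets):
    """Return the pivot packets: those that can be delivered on arrival."""
    received = set()
    next_expected = 1  # assumption: IDs are numbered from 1
    result = []
    for pid in packets:
        received.add(pid)
        if pid == next_expected:
            result.append(pid)
            # Delivering the pivot also flushes any buffered successors.
            while next_expected in received:
                next_expected += 1
    return result


stream = [1, 2, 3, 6, 5, 7, 4, 8, 9, 10, 12, 13, 14, 11]
print(pivots(stream))  # [1, 2, 3, 4, 8, 9, 10, 11]
```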

The distinction we introduced effectively defines a coarsened
representation of the stream of packet IDs using two states: an ordered state O, in
which packets arrive when they were supposed to, and an unordered state U,
in which there is reordering and buffering. Each occurrence of State O is
followed by one or more occurrences of State U.

The dynamics of packet IDs in the ordered state is trivial by definition:
in order, starting with the first expected packet. In [7] we dealt with the
dynamics of packet IDs in the unordered state, and defined a many-to-one
mapping $\mathcal{M}$, sending sequences of IDs into "sketches."

Definition 1 Let $A = (A_1, A_2, \ldots, A_n)$ be a sequence of packet IDs. We
define $\mathcal{M}$ as an operator that, after receiving a packet $A_i$ at time index $i$,
outputs the difference between the highest ID ($H_i$) seen so far and the highest
ID ($L_i$) that could be uploaded:

$\mathcal{M}(A_i) = H_i - L_i.$ (2)

In other words, $\mathcal{M}$ is the size of the smallest buffer large enough to store
all packets that arrive out of order, where the definition of size accounts for
reserving space for unreceived packets with intermediate IDs as well. The
buffer sequence $\mathcal{M}(P)$ associated with a sequence $P$ of packet IDs is simply
a time series of $\mathcal{M}$ values.

Two sequences of packet IDs $P$ and $Q$ are full buffer (FB) equivalent
(written $P \equiv_{FB} Q$) if $\mathcal{M}(P) = \mathcal{M}(Q)$.

Example 2 Let $A = (4\ 3\ 2\ 1)$. Then $\mathcal{M}(A) = (4\ 4\ 4\ 0)$.
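
The buffer sequence of Definition 1 can be computed directly from its description. The sketch below is one straightforward way to do so, assuming IDs numbered from 1 and no duplicates; the name `buffer_sizes` is hypothetical.

```python
def buffer_sizes(packets):
    """Buffer size after each arrival: highest ID seen minus highest ID delivered."""
    received = set()
    highest_seen = 0       # H_i in Definition 1
    highest_delivered = 0  # L_i in Definition 1
    sizes = []
    for pid in packets:
        received.add(pid)
        highest_seen = max(highest_seen, pid)
        # Deliver every packet that is now in order.
        while highest_delivered + 1 in received:
            highest_delivered += 1
        sizes.append(highest_seen - highest_delivered)
    return sizes


print(buffer_sizes([4, 3, 2, 1]))  # [4, 4, 4, 0]
```

Note that the size counts reserved slots as well: after packet 4 arrives first, four slots are accounted for although only one packet is stored.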

The mapping $\mathcal{M}$ is many-to-one, but an inverse can be computed in
polynomial time. This was used in the regeneration algorithm, where in
the unordered state we first sample a sketch $S$ from the distribution of such
sketches and then reconstruct a sequence of IDs that maps (via $\mathcal{M}$) to $S$.

Mapping $\mathcal{M}$ provides a formal way to guarantee that the reconstructed
sequences are locally "similar" to the original one. The formal notion of
similarity has implications for the dynamics of the TCP protocol:

Definition 2 Two packet sequences A, B are behaviorally equivalent if they 
yield the same sequence of ACKs. 
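
For permutations of 1..n, the link between buffer sizes and ACKs can be checked mechanically. The sketch below computes the buffer sizes of Definition 1 and the ACKs of Example 1 for a sequence, and then recomputes the ACKs from the buffer sizes alone. It assumes that every packet is acknowledged, that the ACK before the first arrival is 1 and that the buffer starts empty; all function names are hypothetical.

```python
from itertools import permutations


def sizes_and_acks(packets):
    """Buffer sizes (Definition 1) and ACKs (next expected ID) after each arrival."""
    received, high, low = set(), 0, 0
    sizes, acks = [], []
    for pid in packets:
        received.add(pid)
        high = max(high, pid)
        while low + 1 in received:
            low += 1
        sizes.append(high - low)
        acks.append(low + 1)
    return sizes, acks


def acks_from_buffer_sizes(sizes):
    """Recover the ACK sequence of a permutation from its buffer sizes only."""
    ack, prev, acks = 1, 0, []
    for w in sizes:
        if w < prev:       # buffer shrinks: the ACK advances by the same amount
            ack += prev - w
        elif w == 0:       # buffer stays empty: an in-order packet arrived
            ack += 1
        acks.append(ack)
        prev = w
    return acks


ok = all(
    acks_from_buffer_sizes(sizes_and_acks(p)[0]) == sizes_and_acks(p)[1]
    for n in range(1, 7)
    for p in permutations(range(1, n + 1))
)
print(ok)  # True
```

In particular, two permutations with the same buffer sequence yield the same ACKs, so they are behaviorally equivalent in the sense of Definition 2.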

Suppose now that a TCP implementation uses simple ACKs (as opposed
to SACK), and acknowledges every single packet. Then two sequences that map
(via $\mathcal{M}$) to the same sequence are behaviorally equivalent P]. As the
dynamics of the congestion window is receiver-driven, assuming identical network
conditions for the two ACK sequences, the two traces can be regarded as
"equivalent," from a receiver-oriented standpoint.

We will also need a standard measure of disorder [3]. This measure is
denoted by shuffled up-sequences (SUS) and is defined as follows:

Definition 3 Given a sequence of integers $A$, denote by $SUS(A)$ the minimum
number of ascending subsequences into which we can partition $A$.

For example, a sequence $A = (6, 5, 8, 7, 10, 9, 12, 11, 4, 3, 2)$ has

$SUS(A) = \|\{(6, 8, 10, 12), (5, 7, 9, 11), (4), (3), (2)\}\| = 5,$ (3)

where $\|S\|$ denotes the cardinality of a set $S$.

3 Main result 

In this section we will prove our main result: 

Theorem 1 Let $A$, $B$ be permutations of length $n$ with $SUS(A), SUS(B) \leq 3$
such that $A \equiv_{FB} B$. Then $A = B$.

Observation 1 The theorem is no longer true if we replace the condition
with $SUS(A), SUS(B) \leq 4$. This is witnessed by sequences $A = (4\ 3\ 2\ 1)$
and $B = (4\ 2\ 3\ 1)$. Indeed $A \equiv_{FB} B$, since they both map to sequence
$(4\ 4\ 4\ 0)$. In fact $SUS(A) = 4$, $SUS(B) = 3$.
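
Both the theorem and the observation can be checked by brute force on short permutations. The following sketch groups permutations of length n by their buffer sequence and reports the groups with more than one member. The helper names are hypothetical, and SUS is computed greedily (first ascending list that can take the element), which is our choice for this illustration.

```python
from collections import defaultdict
from itertools import permutations


def buffer_sizes(packets):
    """Buffer size after each arrival (highest ID seen minus highest delivered)."""
    received, high, low, sizes = set(), 0, 0, []
    for pid in packets:
        received.add(pid)
        high = max(high, pid)
        while low + 1 in received:
            low += 1
        sizes.append(high - low)
    return sizes


def sus(seq):
    """Number of ascending lists built greedily."""
    tails = []  # last element of each ascending list
    for x in seq:
        for t, last in enumerate(tails):
            if last < x:
                tails[t] = x
                break
        else:
            tails.append(x)
    return len(tails)


def collisions(n, bound):
    """Groups of distinct permutations with SUS <= bound sharing a buffer sequence."""
    classes = defaultdict(list)
    for p in permutations(range(1, n + 1)):
        if sus(p) <= bound:
            classes[tuple(buffer_sizes(p))].append(p)
    return [group for group in classes.values() if len(group) > 1]


print(collisions(4, 3))  # []
print(collisions(4, 4))  # [[(4, 2, 3, 1), (4, 3, 2, 1)]]
```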

Proof. We consider the greedy algorithm for computing SUS displayed in
Figure 1. The algorithm has been implicitly proved correct in pTj; the reason
is that parameter SUS was shown to be equal to another presortedness measure,
denoted by LDS and defined as follows:

Definition 4 Let $A = (a_1, a_2, \ldots, a_n)$ be a sequence of nonnegative integers.
$LDS(A)$ is defined as the largest length of a decreasing subsequence
$a_{i_1} > a_{i_2} > \ldots > a_{i_j}$ ($1 \leq i_1 < i_2 < \cdots < i_j \leq n$) of $A$.

Algorithm SUSGreedy(W)

INPUT: a list W = (p_1, p_2, ..., p_n) of non-negative integers.

let i = 1;
let j = 1;
let L_1 be the empty list;
while (i <= n) {
    add p_i to the first list L_t, 1 <= t <= j,
        where it can be added while maintaining it sorted;
    if this is not possible
    {
        j++;
        create new list L_j = {p_i};
    }
    i++;
}
let u be the number of lists created by the algorithm;
OUTPUT u = LDS(W) = SUS(W).

Figure 1: Greedy Algorithm for computing SUS
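
A direct transcription of the greedy procedure of Figure 1 is sketched below. The function name `sus_greedy` is hypothetical, and "sorted" is read as strictly ascending, which is the natural reading for sequences without repeats.

```python
def sus_greedy(seq):
    """Return the ascending lists built by the greedy procedure of Figure 1."""
    lists = []
    for x in seq:
        for lst in lists:
            if lst[-1] < x:  # x can be appended while keeping the list sorted
                lst.append(x)
                break
        else:                # no existing list can take x: open a new one
            lists.append([x])
    return lists


parts = sus_greedy([6, 5, 8, 7, 10, 9, 12, 11, 4, 3, 2])
print(parts)       # [[6, 8, 10, 12], [5, 7, 9, 11], [4], [3], [2]]
print(len(parts))  # 5
```

The output reproduces the partition used in equation (3).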

With this definition it is easy to see that Algorithm SUSGreedy (Figure 1)
computes parameter LDS (to make the paper self-contained we reprove this
result below).

We now give a simple algorithm, displayed in Figure 2, that, given a
sequence $W$ of positive integers, constructs (if possible) a permutation $A$ of
size $n$ with $SUS(A) \leq 3$ such that $\mathcal{M}(A) = W$. The proof that the algorithm
is correct will imply the uniqueness of sequence $A$.

We prove the correctness of algorithm RECONSTRUCT in a couple of
intermediate steps. The first two apply to a general sequence $A$ (rather than
one with $SUS(A) \leq 3$).

Proposition 1 Suppose there exists a permutation $\pi$ with $\mathcal{M}(\pi) = w$. Then
the following are true at any stage $i \geq 1$:

1. For any $j \geq 1$, the last element added to list $L_j$ is the maximum
element in lists $L_k$, $k \geq j$. In particular the largest element of $L_1$ is the
maximum element seen so far.

2. If element $x$ is the largest element seen up to stage $i$ then
$x = ACK_i + \mathcal{M}_i - 1$.

Proof. Let $i = 1$. Statement 1 is clearly true. For the second statement,
note that $ACK_1 = 2$ and $\mathcal{M}_1 = 0$ if $x = 1$ (in-order packet); otherwise
$ACK_1 = 1$, $\mathcal{M}_1 = x$.

Consider now the case $i > 1$. By the induction hypothesis, the largest
element seen so far (call it $y$) is the last element of $L_1$ and
$y = ACK_{i-1} + \mathcal{M}_{i-1} - 1$.

Case 1: $x$ is added to $L_1$. By the definition $x > y$, so $x$ is the largest
element seen so far. Moreover, since $x$ is an out-of-order element we have
$ACK_i = ACK_{i-1}$ and $\mathcal{M}_i = \mathcal{M}_{i-1} + x - y$. Substituting the induction
hypothesis for $y$ gives $x = ACK_i + \mathcal{M}_i - 1$.

Case 2: $x$ is added to some other list $L_j$. If $x$ is the first element
of the new list then statement 1 follows immediately. Otherwise let $z$ be the
largest element of list $L_j$ before adding $x$. Applying the induction hypothesis
it follows that $z$ is the largest element in lists $L_k$, $k \geq j$. But $z < x$ (since we
add $x$ to list $L_j$). Thus $x$ becomes the new largest element of lists $L_k$, $k \geq j$.

As for the second statement, from the algorithm it follows that $x < y$, so
$y$ is still the largest element seen so far. If the buffer size does not change
then the desired relation follows from $y = ACK_{i-1} + \mathcal{M}_{i-1} - 1$ (which holds by
induction) and relations $ACK_i = ACK_{i-1}$ and $\mathcal{M}_i = \mathcal{M}_{i-1}$. Otherwise the
Algorithm RECONSTRUCT

INPUT: a list W = (w_1, w_2, ..., w_n) of positive integers.

let PACKET and ACK be integer vectors of size n,
    with all fields initially equal to -1;
let LARGEST = 0;
conventionally define ACK[0] = 0;
for (i = 1 to n) {
    if (w_i < w_{i-1}) {
        PACKET[i] := ACK[i-1];
        ACK[i] := ACK[i-1] + (w_{i-1} - w_i);
    }
    else {
        ACK[i] := ACK[i-1];
        if (w_i > w_{i-1})
            LARGEST := PACKET[i] := LARGEST + (w_i - w_{i-1});
    }
}

for (i = 1 to n) {
    if (w_i = w_{i-1})
        let PACKET[i] be the smallest positive integer
            not present among the values of PACKET set so far;
}

if (vector PACKET is a permutation of {1, ..., n})
    return PACKET;
else
    return NO PERMUTATION EXISTS;

Figure 2: Algorithm for reconstructing permutations from buffer sizes
buffer shrinks with size $ACK_i - ACK_{i-1}$, so $\mathcal{M}_{i-1} - \mathcal{M}_i = ACK_i - ACK_{i-1}$.
We infer the fact that

$ACK_{i-1} + \mathcal{M}_{i-1} - 1 = ACK_i - (ACK_i - ACK_{i-1}) + \mathcal{M}_{i-1} - 1$
$= ACK_i + (\mathcal{M}_i - \mathcal{M}_{i-1}) + \mathcal{M}_{i-1} - 1 = ACK_i + \mathcal{M}_i - 1.$



□ 



Corollary 3.1 Algorithm SUSGreedy correctly computes $u = LDS(A)$ (which
is equal 'JJJ to $SUS(A)$).

Proof. Let $B = a_{i_1} > a_{i_2} > \ldots > a_{i_{LDS(A)}}$ be a decreasing subsequence
of $A$ of maximum length, and let $L_1, L_2, \ldots, L_u$ be the lists created by the
algorithm on input sequence $A$. Each list is increasing, so it contains at
most one element from $B$. Therefore $u \geq LDS(A)$. On the other hand, each
element $a_m$ sent by the algorithm to a list $L_k$, $k \geq 2$, is smaller than some
element $a_{m'}$, $m' < m$, sent by the algorithm to list $L_{k-1}$ (otherwise $a_m$ would
be sent to a list $L_j$, $j < k$). Applying this observation starting with the last
element of list $L_u$ we create a decreasing sequence of length $u$. It follows that
$u \leq LDS(A)$, thus $u = LDS(A)$. □
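
The equality between the greedy count and LDS can also be checked exhaustively on short permutations. The sketch below compares the number of lists opened by the greedy procedure with a quadratic-time dynamic program for the longest decreasing subsequence; both function names are hypothetical, and the dynamic program is our choice for the comparison rather than anything used in the paper.

```python
from itertools import permutations


def greedy_count(seq):
    """Number of ascending lists opened by the greedy procedure of Figure 1."""
    tails = []  # last element of each list, in creation order
    for x in seq:
        for t, last in enumerate(tails):
            if last < x:
                tails[t] = x
                break
        else:
            tails.append(x)
    return len(tails)


def lds(seq):
    """Length of a longest strictly decreasing subsequence (O(n^2) dynamic program)."""
    best = []  # best[i]: longest decreasing subsequence ending at position i
    for i, x in enumerate(seq):
        longest_before = max((best[j] for j in range(i) if seq[j] > x), default=0)
        best.append(1 + longest_before)
    return max(best, default=0)


agree = all(
    greedy_count(p) == lds(p)
    for n in range(1, 7)
    for p in permutations(range(1, n + 1))
)
print(agree)  # True
```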



From now on we assume that there exists a permutation $A$ with $SUS(A) \leq 3$
such that $\mathcal{M}(A) = w$. We will run the algorithm SUSGreedy alongside
algorithm RECONSTRUCT. First we give a simple corollary of Proposition 1.

Corollary 3.2 Suppose that $w_i > w_{i-1}$. Let $y$ be the largest ID of a packet
received in stages $1$ to $i - 1$ and $x$ be the ID of the new packet. Then

$x = y + (w_i - w_{i-1})$

and $x$ is added by SUSGreedy to list $L_1$.

Next we deal with another possible case, the one when the buffer size
shrinks:

Proposition 2 (a). Let packet ID $x$ be added at stage $i$, and assume that
$w_i < w_{i-1}$. Then $x = ACK_{i-1}$ and all packets with indices at most
$ACK_{i-1} + (w_{i-1} - w_i - 1)$ have been received in the first $i$ stages.

(b). Suppose packet ID $x$ is added by algorithm SUSGreedy to list $L_3$. Then
packet $x$ falls into case (a) of this proposition.

Proof. 

(a). The fact that $x = ACK_{i-1}$ follows from the definition of parameter
ACK and the fact that the buffer shrinks. The second relation follows
from the fact that the buffer shrinks by exactly $w_{i-1} - w_i$.

(b). Since $x$ goes in list $L_3$, at the time when it is added $x$ is smaller than the
last element in lists $L_1$ and $L_2$. If $x$ were larger than $ACK_{i-1}$ then
the packet with index $ACK_{i-1}$ (which arrives some time after $x$ does)
could not be placed in lists $L_1$, $L_2$ or $L_3$, making the sequence $A$ require
$SUS(A) \geq 4$, a contradiction.

The other two relations follow from the definition of parameter $ACK_i$.

□ 



Finally, the correctness of the algorithm RECONSTRUCT (and the proof
of Theorem 1) follows easily: the correctness of the first for loop in algorithm
RECONSTRUCT follows from Corollary 3.2 and Proposition 2. Moreover, if
a packet ID $x$ is set at stage $i$ in the second for loop then it must correspond
to adding $x$ to list $L_2$. Since list $L_2$ is sorted, $x$ is the smallest element that
has not been set up to this stage.

Assuming that a permutation $A$ in the preimage of $w$ exists, algorithm
RECONSTRUCT is going to output exactly $A$. Since $A$ was chosen in an
arbitrary manner, the uniqueness of $A$ follows.

□ 
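
The reconstruction argument can be tried out with the following sketch, which follows the two passes of Figure 2 but is not a literal transcription of it. Three conventions are assumptions of this sketch: the ACK before the first arrival is 1 (the convention of Example 1), the buffer size before the first arrival is 0, and an in-order packet arriving at an empty buffer is assigned the next expected ID. Instead of the LARGEST variable, the new largest ID is obtained from statement 2 of Proposition 1. The final check recomputes the buffer sizes of the candidate; all names are hypothetical.

```python
from itertools import permutations


def buffer_sizes(packets):
    """Buffer size after each arrival (highest ID seen minus highest delivered)."""
    received, high, low, sizes = set(), 0, 0, []
    for pid in packets:
        received.add(pid)
        high = max(high, pid)
        while low + 1 in received:
            low += 1
        sizes.append(high - low)
    return sizes


def reconstruct(w):
    """Return a permutation whose buffer sequence is w, or None if none is found."""
    n = len(w)
    packet = [None] * n
    ack, prev = 1, 0  # assumptions: initial ACK is 1, initial buffer size is 0
    for i, wi in enumerate(w):
        if wi < prev:    # buffer shrinks: the expected packet arrived
            packet[i] = ack
            ack += prev - wi
        elif wi > prev:  # buffer grows: a new largest ID arrived
            packet[i] = ack + wi - 1
        elif wi == 0:    # empty buffer, in-order packet (case added in this sketch)
            packet[i] = ack
            ack += 1
        prev = wi
    # Second pass: remaining slots receive the unused IDs in increasing order.
    used = {p for p in packet if p is not None}
    unused = iter(sorted(set(range(1, n + 1)) - used))
    for i in range(n):
        if packet[i] is None:
            packet[i] = next(unused, None)
    if None in packet or sorted(packet) != list(range(1, n + 1)):
        return None
    return packet if buffer_sizes(packet) == list(w) else None


print(reconstruct([4, 4, 4, 0]))  # [4, 2, 3, 1]

# Round trip over all permutations of length 4: only (4 3 2 1), whose SUS is 4,
# is not recovered, because it shares its buffer sequence with (4 2 3 1).
failures = [p for p in permutations(range(1, 5))
            if reconstruct(buffer_sizes(p)) != list(p)]
print(failures)  # [(4, 3, 2, 1)]
```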



4 Application to RESTORED 

The result we just proved allows the reinterpretation of results in [7], [5]. In
that paper it was shown experimentally that Restored is able to recover
several measures of quality of service, among them the following metric [S].
For simplicity our version of the metric is adapted to the case of permutations
(i.e. sequences with no repeats or packet losses):

Definition 5 Reorder Density (RD).

Consider an implementation-dependent parameter $DT$ that is a positive
integer or $\infty$. Given a permutation $\pi$ we define the reorder density of $\pi$ as
the distribution of displacements $\pi[i] - i$, restricted to those displacements in
the range $[-DT, DT]$.

We also need the following definition from [5]: 

Definition 6 A metric $M$ is consistent with respect to $\equiv_{FB}$ if for any two
ID sequences $A$ and $B$,

$A \equiv_{FB} B \Rightarrow M(A) = M(B)$.

In other words, a consistent measure M takes equal values on equivalent 
sequences. 

Example 3 By equation (1), every measure defined in terms of the time
series of parameter RcvWindow (e.g. the average value of this parameter) is
consistent with respect to $\equiv_{FB}$.

In particular, since Restored (in the form used in [TJE]) guarantees that,
on sequence $A$, it will reconstruct a sequence $R(A)$ such that $R(A) \equiv_{FB} A$, it
is not really that surprising that Restored should be able to capture any
metric consistent with respect to $\equiv_{FB}$. The reason that the experimental
results from [7] were somewhat surprising is that RD is an example of an
inconsistent measure according to the terminology of Definition 6.



Observation 2 If $A = (4\ 3\ 2\ 1)$ and $B = (4\ 2\ 3\ 1)$ then the distributions
of displacements are

$D(A) = \begin{pmatrix} -3 & -1 & 1 & 3 \\ 1/4 & 1/4 & 1/4 & 1/4 \end{pmatrix}$ and $D(B) = \begin{pmatrix} -3 & 0 & 3 \\ 1/4 & 1/2 & 1/4 \end{pmatrix}$,

respectively. It is easy to see that, no matter how we
set the parameter $DT$ to either a positive integer or $\infty$, the truncated versions
of distributions $D(A)$, $D(B)$ are going to be different. Thus $A \equiv_{FB} B$ but
$D(A) \neq D(B)$, which means that measure RD is inconsistent independently
of the value of threshold parameter $DT$.
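
The two displacement distributions can be recomputed with the sketch below. The function name `reorder_density` is hypothetical; `dt=None` stands for an infinite threshold, and the frequencies are kept as fractions of all packets without renormalising after truncation, which is an assumption of this sketch rather than part of Definition 5.

```python
from collections import Counter
from fractions import Fraction


def reorder_density(perm, dt=None):
    """Distribution of displacements perm[i] - i (1-based), kept within [-dt, dt]."""
    n = len(perm)
    counts = Counter(pid - pos for pos, pid in enumerate(perm, start=1))
    return {
        d: Fraction(c, n)
        for d, c in sorted(counts.items())
        if dt is None or abs(d) <= dt
    }


A, B = (4, 3, 2, 1), (4, 2, 3, 1)
print({d: str(f) for d, f in reorder_density(A).items()})
# {-3: '1/4', -1: '1/4', 1: '1/4', 3: '1/4'}
print({d: str(f) for d, f in reorder_density(B).items()})
# {-3: '1/4', 0: '1/2', 3: '1/4'}

# The truncated distributions differ for every threshold tried here.
differ = all(reorder_density(A, dt) != reorder_density(B, dt)
             for dt in [1, 2, 3, 4, None])
print(differ)  # True
```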

However, Theorem 1 forces us to reevaluate this statement: since the
vast majority of traces used in [7] had $SUS \leq 3$, the measure is "consistent
in practice" (at least on this dataset). Theorem 1 also exposes a weakness of
the encoding used there: on "real-life" traces the extra potential compression
given by the many-to-one nature of map FB is not present.